Abstract

We show that more head-driven parsing algorithms can
be formulated than those occurring in the existing
literature. These algorithms are inspired by a family of
left-to-right parsing algorithms from a recent publication.
We further introduce a more advanced notion of
"head-driven parsing" which allows more detailed
specification of the processing order of non-head elements
in the right-hand side. We develop a parsing algorithm
for this strategy, based on LR parsing techniques.






Introduction 



According to the head-driven paradigm, parsing of a
formal language is started from the elements within the
input string that are most contentful either from a
syntactic or, more generally, from an information theoretic
point of view. This results in the weakening of the
left-to-right feature of most traditional parsing methods.
Following a pervasive trend in modern theories of
Grammar (consider for instance [5, 3, 11]) the
computational linguistics community has paid considerable
attention to the head-driven paradigm by investigating its
applications to context-free language parsing.
Several methods have been proposed so far exploiting
some nondeterministic head-driven strategy for
context-free language parsing (see among others [6, 13,
2, 14]). All these proposals can be seen as
generalizations to the head-driven case of parsing
prescriptions originally conceived for the left-to-right
case. The methods above suffer from deficiencies that are
also noticeable in the left-to-right case. In fact, when
several rules in the grammar share the same head element, or
share some infix of their right-hand side including the
head, the recognizer nondeterministically guesses a rule
just after having seen the head. In this way analyses
that could have been shared are duplicated in the
parsing process.

Interesting techniques have been proposed in the
left-to-right deterministic parsing literature to overcome
redundancy problems of the above kind, thus reducing
the degree of nondeterminism of the resulting methods.
These solutions range from predictive LR parsing to LR
parsing [15, 1]. On the basis of work in [8] for
nondeterministic left-to-right parsing, we trace here a theory
of head-driven parsing going from crude top-down and
head-corner to more sophisticated solutions, in an
attempt to make the behaviour of head-driven methods
successively more deterministic.

Finally, we propose an original generalization of head- 
driven parsing, allowing a more detailed specification of 
the order in which elements of a right-hand side are to 
be processed. We study in detail a solution to such 
a head-driven strategy based on LR parsing. Other 
methods presented in this paper could be extended as 
well. 

Preliminaries 

The notation used in the sequel is for the most part 
standard and is summarised below. 

Let $D$ be an alphabet (a finite set of symbols); $D^+$
denotes the set of all (finite) non-empty strings over $D$
and $D^*$ denotes $D^+ \cup \{\varepsilon\}$, where
$\varepsilon$ denotes the empty string. Let $R$ be a binary
relation; $R^+$ denotes the transitive closure of $R$ and
$R^*$ denotes the reflexive and transitive closure of $R$.

A context-free grammar $G = (N, T, P, S)$ consists of
two finite disjoint sets $N$ and $T$ of nonterminal and
terminal symbols, respectively, a start symbol $S \in N$,
and a finite set of rules $P$. Every rule has the form
$A \to \alpha$, where the left-hand side (lhs) $A$ is an
element from $N$ and the right-hand side (rhs) $\alpha$ is
an element from $V^+$, where $V$ denotes $(N \cup T)$.
(Note that we do not allow rules with empty right-hand
sides. This is for the sake of presentational simplicity.)
We use symbols $A, B, C, \ldots$ to range over $N$, symbols
$X, Y, Z$ to range over $V$, symbols
$\alpha, \beta, \gamma, \ldots$ to range over $V^*$,
and $v, w, x, \ldots$ to range over $T^*$.

In the context-free grammars that we will consider,
called head grammars, exactly one member from each
rhs is distinguished as the head. We indicate the head
by underlining it, e.g., we write
$A \to \alpha \underline{X} \beta$. An expression
$A \to \alpha \underline{\gamma} \beta$ denotes a rule in
which the head is some member within $\gamma$. We define a
binary relation $\Diamond$ such that $B \Diamond A$ if and
only if $A \to \alpha \underline{B} \beta$ for some $\alpha$
and $\beta$. Relation $\Diamond^*$ is called the
head-corner relation.
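The head-corner relation can be computed mechanically from a head grammar. The following is only a minimal sketch under stated assumptions: rules are encoded as hypothetical triples `(lhs, rhs, head_index)`, the nonterminals are taken to be exactly the symbols that occur as a left-hand side, and the name `head_corner_closure` is introduced here for illustration.

```python
def head_corner_closure(rules):
    """Reflexive and transitive closure of the head relation.

    A pair (B, A) in the result means that B is related to A by zero or
    more steps, where one step holds if B is the head of a rule for A.
    Encoding assumption: each rule is (lhs, rhs, head_index).
    """
    nonterminals = {lhs for lhs, _, _ in rules}
    # Reflexive part: every nonterminal is related to itself.
    relation = {(a, a) for a in nonterminals}
    # Single steps: the head of the rhs is related to the lhs.
    # Terminal heads are skipped, since the relation is over nonterminals.
    for lhs, rhs, head in rules:
        if rhs[head] in nonterminals:
            relation.add((rhs[head], lhs))
    # Naive transitive closure; adequate for small grammars.
    changed = True
    while changed:
        changed = False
        for b, a in list(relation):
            for c, d in list(relation):
                if a == c and (b, d) not in relation:
                    relation.add((b, d))
                    changed = True
    return relation


# Hypothetical grammar: S -> A b (head A), A -> c B (head B), B -> d (head d).
example_rules = [
    ("S", ("A", "b"), 0),
    ("A", ("c", "B"), 1),
    ("B", ("d",), 0),
]
example_relation = head_corner_closure(example_rules)
```

In this hypothetical grammar, `B` is a head corner of `S` through `A`, even though `B` is not the leftmost member of the rule for `A`.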

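For technical reasons we sometimes need the augmented
set of rules $P^\dagger$, consisting of all rules in $P$
plus the extra rule $S' \to \underline{\bot} S$, where $S'$
is a fresh nonterminal, and $\bot$ is a fresh terminal
acting as an imaginary zeroth input symbol. The relation
$P^\dagger$ is extended to a relation $\to$ on
$V^* \times V^*$ as usual. We write
$\gamma \xrightarrow{p} \delta$ whenever $\gamma \to \delta$
holds as an extension of $p \in P^\dagger$. We write
$\gamma \xrightarrow{p_1 p_2 \cdots p_s} \delta$ if
$\gamma \xrightarrow{p_1} \delta_1 \xrightarrow{p_2} \delta_2 \cdots \delta_{s-1} \xrightarrow{p_s} \delta$.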

For a fixed grammar, a head-driven recognition algorithm
can be specified by means of a stack automaton
$\mathcal{A} = (T, Alph, Init(n), \mapsto, Fin(n))$,
parameterised with the length $n$ of the input. In
$\mathcal{A}$, symbols $T$ and $Alph$ are the input and
stack alphabets respectively, $Init(n), Fin(n) \in Alph$
are two distinguished stack symbols and $\mapsto$ is the
transition relation, defined on $Alph^+ \times Alph^+$ and
implicitly parameterised with the input.

Such an automaton manipulates stacks $\Gamma \in Alph^+$
(constructed from left to right) while consulting the
symbols in the given input string. The initial stack
is $Init(n)$. Whenever $\Gamma \mapsto \Gamma'$ holds, one
step of the automaton may, under some conditions on the
input, transform a stack of the form $\Gamma''\Gamma$ into
the stack $\Gamma''\Gamma'$. In words,
$\Gamma \mapsto \Gamma'$ denotes that if the top-most few
symbols on the stack are $\Gamma$ then these may be
replaced by the symbols $\Gamma'$. Finally, the input is
accepted whenever the automaton reaches stack $Fin(n)$.
Stack automata presented in what follows act as
recognizers. Parsing algorithms can directly be obtained by
pairing these automata with an output effect.

A family of head-driven algorithms 

This section investigates the adaptation of a family of 
left-to-right parsing algorithms from [8], viz. top-down, 
left-corner, PLR, ELR, and LR parsing, to head gram- 
mars. 

Top-down parsing 

The following is a straightforward adaptation of top- 
down (TD) parsing [1] to head grammars. 

There are two kinds of stack symbol (items), one of
the form $[i, A, j]$, which indicates that some
subderivation from $A$ is needed deriving a substring of
$a_{i+1} \ldots a_j$, the other of the form
$[i, k, A \to \alpha \bullet \gamma \bullet \beta, m, j]$,
which also indicates that some subderivation from $A$ is
needed deriving a substring of $a_{i+1} \ldots a_j$, but
specifically using the rule
$A \to \alpha \underline{\gamma} \beta$, where
$\gamma \to^* a_{k+1} \ldots a_m$ has already been
established. Formally, we have

$$I^{TD}_1 = \{[i, A, j] \mid i < j\}$$

$$I^{TD}_2 = \{[i, k, A \to \alpha \bullet \gamma \bullet \beta, m, j] \mid A \to \alpha \underline{\gamma} \beta \in P^\dagger \land i \le k < m \le j\}$$

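Algorithm 1 (Head-driven top-down)

$\mathcal{A}^{TD} = (T, I^{TD}_1 \cup I^{TD}_2, Init(n), \mapsto, Fin(n))$, where
$Init(n) = [-1, -1, S' \to \bullet \bot \bullet S, 0, n]$,
$Fin(n) = [-1, -1, S' \to \bullet \bot S \bullet, n, n]$, and the transition
relation $\mapsto$ is given by the following clauses.

0   $[i, A, j] \mapsto [i, A, j][i, B, j]$
    where there is $A \to \alpha \underline{B} \beta \in P^\dagger$

0a  $[i, k, A \to \alpha \bullet \gamma \bullet B\beta, m, j] \mapsto [i, k, A \to \alpha \bullet \gamma \bullet B\beta, m, j][m, B, j]$

0b  $[i, k, A \to \alpha B \bullet \gamma \bullet \beta, m, j] \mapsto [i, k, A \to \alpha B \bullet \gamma \bullet \beta, m, j][i, B, k]$

1   $[i, A, j] \mapsto [i, k-1, A \to \alpha \bullet a \bullet \beta, k, j]$
    where there are $A \to \alpha \underline{a} \beta \in P^\dagger$ and $k$ such that
    $i < k \le j$ and $a_k = a$

2a  $[i, k, A \to \alpha \bullet \gamma \bullet a\beta, m, j] \mapsto [i, k, A \to \alpha \bullet \gamma a \bullet \beta, m+1, j]$
    provided $m < j$ and $a_{m+1} = a$

2b  Symmetric to 2a (cf. 0a and 0b)

3   $[i, A, j][i', k, B \to \bullet \delta \bullet, m, j'] \mapsto [i, k, A \to \alpha \bullet B \bullet \beta, m, j]$
    where there is $A \to \alpha \underline{B} \beta \in P^\dagger$
    ($i = i'$ and $j = j'$ are automatically satisfied)

4a  $[i, k, A \to \alpha \bullet \gamma \bullet B\beta, m, j][i', k', B \to \bullet \delta \bullet, m', j'] \mapsto [i, k, A \to \alpha \bullet \gamma B \bullet \beta, m', j]$
    provided $m = k'$
    ($m = i'$ and $j = j'$ are automatically satisfied)

4b  Symmetric to 4a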

We call a grammar head-recursive if $A \Diamond^+ A$ for
some $A$. Head-driven TD parsing may loop exactly for the
grammars which are head-recursive. Head recursion is a
generalization of left recursion for traditional TD
parsing.
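Whether a grammar is head-recursive can be checked before running a head-driven TD parser. The sketch below is an illustration under assumptions, not part of the original algorithms: rules are hypothetical triples `(lhs, rhs, head_index)`, nonterminals are the symbols occurring as a left-hand side, and `is_head_recursive` is a name chosen here.

```python
def is_head_recursive(rules):
    """Return True if some nonterminal reaches itself through one or
    more head steps (lhs -> nonterminal head of one of its rules)."""
    nonterminals = {lhs for lhs, _, _ in rules}
    # heads_of[A] holds every nonterminal that is the head of a rule for A.
    heads_of = {a: set() for a in nonterminals}
    for lhs, rhs, head in rules:
        if rhs[head] in nonterminals:
            heads_of[lhs].add(rhs[head])

    def reachable(start):
        # Nonterminals reachable from `start` in at least one head step.
        seen, todo = set(), list(heads_of[start])
        while todo:
            x = todo.pop()
            if x not in seen:
                seen.add(x)
                todo.extend(heads_of[x])
        return seen

    return any(a in reachable(a) for a in nonterminals)


# Hypothetical left-recursive grammar E -> E + T | T, T -> t.
# With '+' chosen as head, the left recursion is harmless for
# head-driven top-down parsing; with E as head it is not.
plus_headed = [("E", ("E", "+", "T"), 1), ("E", ("T",), 0), ("T", ("t",), 0)]
e_headed = [("E", ("E", "+", "T"), 0), ("E", ("T",), 0), ("T", ("t",), 0)]
```

The two hypothetical grammars differ only in the choice of head, which illustrates that head recursion depends on the head assignment and not on the underlying context-free rules alone.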

In the case of grammars with some parameter mech- 
anism, top-down parsing has the advantage over other 
kinds of parsing that top-down propagation of parame- 
ter values is possible in collaboration with context-free 
parsing (cf. the standard evaluation of definite clause 
grammars), which may lead to more efficient process- 
ing. This holds for left-to-right parsing as well as for 
head-driven parsing [10]. 

Head-corner parsing 

The predictive steps from Algorithm 1, represented by
Clause 0 and supported by Clauses 0a and 0b, can be
compiled into the head-corner relation $\Diamond^*$. This
gives the head-corner (HC) algorithm below. The items from
$I^{TD}_1$ are no longer needed now. We define
$I^{HC} = I^{TD}_2$.

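Algorithm 2 (head-corner)

$\mathcal{A}^{HC} = (T, I^{HC}, Init(n), \mapsto, Fin(n))$, where
$Init(n) = [-1, -1, S' \to \bullet \bot \bullet S, 0, n]$,
$Fin(n) = [-1, -1, S' \to \bullet \bot S \bullet, n, n]$, and $\mapsto$ is given
by the following clauses. (Clauses 1b, 2b, 3b, 4b are
omitted, since these are symmetric to 1a, 2a, 3a, 4a,
respectively.)

1a  $[i, k, A \to \alpha \bullet \gamma \bullet B\beta, m, j] \mapsto [i, k, A \to \alpha \bullet \gamma \bullet B\beta, m, j][m, p-1, C \to \eta \bullet a \bullet \theta, p, j]$
    where there are $C \to \eta \underline{a} \theta \in P^\dagger$ and $p$ such that
    $m < p \le j$ and $a_p = a$ and $C \Diamond^* B$

2a  $[i, k, A \to \alpha \bullet \gamma \bullet a\beta, m, j] \mapsto [i, k, A \to \alpha \bullet \gamma a \bullet \beta, m+1, j]$
    provided $m < j$ and $a_{m+1} = a$

3a  $[i, k, D \to \alpha \bullet \gamma \bullet A\beta, m, j][i', k', B \to \bullet \delta \bullet, m', j'] \mapsto [i, k, D \to \alpha \bullet \gamma \bullet A\beta, m, j][i', k', C \to \eta \bullet B \bullet \theta, m', j']$
    provided $m = i'$, where there is $C \to \eta \underline{B} \theta \in P^\dagger$ such
    that $C \Diamond^* A$
    ($j = j'$ is automatically satisfied)

4a  $[i, k, A \to \alpha \bullet \gamma \bullet B\beta, m, j][i', k', B \to \bullet \delta \bullet, m', j'] \mapsto [i, k, A \to \alpha \bullet \gamma B \bullet \beta, m', j]$
    provided $m = k'$
    ($m = i'$ and $j = j'$ are automatically satisfied)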

Head-corner parsing as well as all algorithms in the
remainder of this paper may loop exactly for the
grammars which are cyclic (where $A \to^+ A$ for some $A$).

The head-corner algorithm above is the only one in 
this paper which has already appeared in the literature, 
in different guises [6, 13, 2, 14]. 

Predictive HI parsing 

We say two rules $A \to \alpha_1$ and $B \to \alpha_2$ have
a common infix $\alpha$ if
$\alpha_1 = \beta_1 \underline{\alpha} \gamma_1$ and
$\alpha_2 = \beta_2 \underline{\alpha} \gamma_2$, for some
$\beta_1$, $\beta_2$, $\gamma_1$ and $\gamma_2$. The notion
of common infix is an adaptation of the notion of common
prefix [8] to head grammars.

If a grammar contains many common infixes, then
HC parsing may be very nondeterministic; in particular,
Clauses 1 or 3 may be applied with different rules
$C \to \eta \underline{a} \theta \in P^\dagger$ or
$C \to \eta \underline{B} \theta \in P^\dagger$ for fixed
$a$ or $B$.

In [15] an idea is described that allows reduction of 
nondeterminism in case of common prefixes and left- 
corner parsing. The resulting algorithm is called pre- 
dictive LR (PLR) parsing. The following is an adapta- 
tion of this idea to HC parsing. The resulting algorithm 
is called predictive HI (PHI) parsing. (HI parsing, to 
be discussed later, is a generalization of LR parsing to 
head grammars.) 

First, we need a different kind of item, viz. of the
form $[i, k, A \to \gamma, m, j]$, where there is some rule
$A \to \alpha \underline{\gamma} \beta$. With such an item,
we simulate computation of different items
$[i, k, A \to \alpha \bullet \gamma \bullet \beta, m, j] \in I^{HC}$,
for different $\alpha$ and $\beta$, which would be treated
individually by an HC parser. Formally, we have

$$I^{PHI} = \{[i, k, A \to \gamma, m, j] \mid A \to \alpha \underline{\gamma} \beta \in P^\dagger \land i \le k < m \le j\}$$

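Algorithm 3 (Predictive HI)

$\mathcal{A}^{PHI} = (T, I^{PHI}, Init(n), \mapsto, Fin(n))$, where
$Init(n) = [-1, -1, S' \to \bot, 0, n]$,
$Fin(n) = [-1, -1, S' \to \bot S, n, n]$, and $\mapsto$ is given by
the following (symmetric "b-clauses" omitted).

1a  $[i, k, A \to \gamma, m, j] \mapsto [i, k, A \to \gamma, m, j][m, p-1, C \to a, p, j]$
    where there are $C \to \eta \underline{a} \theta$, $A \to \alpha \underline{\gamma} B\beta \in P^\dagger$ and $p$
    such that $m < p \le j$ and $a_p = a$ and $C \Diamond^* B$

2a  $[i, k, A \to \gamma, m, j] \mapsto [i, k, A \to \gamma a, m+1, j]$
    provided $m < j$ and $a_{m+1} = a$, where there is
    $A \to \alpha \underline{\gamma} a\beta \in P^\dagger$

3a  $[i, k, D \to \gamma, m, j][i', k', B \to \delta, m', j'] \mapsto [i, k, D \to \gamma, m, j][i', k', C \to B, m', j']$
    provided $m = i'$ and $B \to \delta \in P^\dagger$, where there are
    $D \to \alpha \underline{\gamma} A\beta$, $C \to \eta \underline{B} \theta \in P^\dagger$ such that $C \Diamond^* A$

4a  $[i, k, A \to \gamma, m, j][i', k', B \to \delta, m', j'] \mapsto [i, k, A \to \gamma B, m', j]$
    provided $m = k'$ and $B \to \delta \in P^\dagger$, where there is
    $A \to \alpha \underline{\gamma} B\beta \in P^\dagger$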

Extended HI parsing 

The PHI algorithm can process simultaneously a common
infix $\alpha$ in two different rules
$A \to \beta_1 \underline{\alpha} \gamma_1$ and
$A \to \beta_2 \underline{\alpha} \gamma_2$, which reduces
nondeterminism.

We may however also specify an algorithm which succeeds
in simultaneously processing all common infixes,
irrespective of whether the left-hand sides of the
corresponding rules are the same. This algorithm is
inspired by extended LR (ELR) parsing [12, 7] for
extended context-free grammars (where right-hand sides
consist of regular expressions over $V$). By analogy, it
will be called extended HI (EHI) parsing.

This algorithm uses yet another kind of item, viz.
of the form $[i, k, \{A_1, A_2, \ldots, A_p\} \to \gamma, m, j]$,
where there exists at least one rule
$A \to \alpha \underline{\gamma} \beta$ for each
$A \in \{A_1, A_2, \ldots, A_p\}$. With such an item, we
simulate computation of different items
$[i, k, A \to \alpha \bullet \gamma \bullet \beta, m, j] \in I^{HC}$
which would be treated individually by an HC parser.
Formally, we have

$$I^{EHI} = \{[i, k, \Delta \to \gamma, m, j] \mid \emptyset \subset \Delta \subseteq \{A \mid A \to \alpha \underline{\gamma} \beta \in P^\dagger\} \land i \le k < m \le j\}$$

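Algorithm 4 (Extended HI)

$\mathcal{A}^{EHI} = (T, I^{EHI}, Init(n), \mapsto, Fin(n))$, where
$Init(n) = [-1, -1, \{S'\} \to \bot, 0, n]$,
$Fin(n) = [-1, -1, \{S'\} \to \bot S, n, n]$, and $\mapsto$ is given by:

1a  $[i, k, \Delta \to \gamma, m, j] \mapsto [i, k, \Delta \to \gamma, m, j][m, p-1, \Delta' \to a, p, j]$
    where there is $p$ such that $m < p \le j$ and $a_p = a$
    and $\Delta' = \{C \mid \exists C \to \eta \underline{a} \theta, A \to \alpha \underline{\gamma} B\beta \in P^\dagger (A \in \Delta \land C \Diamond^* B)\}$
    is not empty

2a  $[i, k, \Delta \to \gamma, m, j] \mapsto [i, k, \Delta' \to \gamma a, m+1, j]$
    provided $m < j$ and $a_{m+1} = a$ and
    $\Delta' = \{A \in \Delta \mid A \to \alpha \underline{\gamma} a\beta \in P^\dagger\}$ is not empty

3a  $[i, k, \Delta \to \gamma, m, j][i', k', \Delta' \to \delta, m', j'] \mapsto [i, k, \Delta \to \gamma, m, j][i', k', \Delta'' \to B, m', j']$
    provided $m = i'$ and $B \to \delta \in P^\dagger$ for some $B \in \Delta'$
    such that $\Delta'' = \{C \mid \exists C \to \eta \underline{B} \theta, D \to \alpha \underline{\gamma} A\beta \in P^\dagger (D \in \Delta \land C \Diamond^* A)\}$
    is not empty

4a  $[i, k, \Delta \to \gamma, m, j][i', k', \Delta' \to \delta, m', j'] \mapsto [i, k, \Delta'' \to \gamma B, m', j]$
    provided $m = k'$ and $B \to \delta \in P^\dagger$ for some $B \in \Delta'$
    such that $\Delta'' = \{A \in \Delta \mid A \to \alpha \underline{\gamma} B\beta \in P^\dagger\}$ is not
    empty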



This algorithm can be simplified by omitting the
sets $\Delta$ from the items. This results in common infix
(CI) parsing, which is a generalization of common
prefix parsing [8]. CI parsing does not satisfy the correct
subsequence property, to be discussed later. For space
reasons, we omit further discussion of CI parsing.

HI parsing 

If we translate the difference between ELR and LR pars- 
ing [8] to head-driven parsing, we are led to HI parsing, 
starting from EHI parsing, as described below. The al- 
gorithm is called HI because it computes head-inward 
derivations in reverse, in the same way as LR parsing 
computes rightmost derivations in reverse [1]. Head- 
inward derivations will be discussed later in this paper. 

HI parsing uses items of the form $[i, k, Q, m, j]$, where
$Q$ is a non-empty set of "double-dotted" rules
$A \to \alpha \bullet \gamma \bullet \beta$. The fundamental
difference with the items in $I^{EHI}$ is that the infix
$\gamma$ in the right-hand sides does not have to be fixed.
Formally, we have

$$I^{HI} = \{[i, k, Q, m, j] \mid \emptyset \subset Q \subseteq \{A \to \alpha \bullet \gamma \bullet \beta \mid A \to \alpha \underline{\gamma} \beta \in P^\dagger\} \land i \le k < m \le j\}$$

We explain the difference in behaviour of HI parsing
with regard to EHI parsing by investigating Clauses 1a
and 2a of Algorithm 4. (Clauses 3a and 4a would give
rise to a similar discussion.) Clauses 1a and 2a both
address some terminal $a_p$, with $m < p \le j$. In Clause
1a, the case is treated that $a_p$ is the head (which is not
necessarily the leftmost member) of a rhs which the
algorithm sets out to recognize; in Clause 2a, the case is
treated that $a_p$ is the next member of a rhs of which
some members have already been recognized, in which
case we must of course have $p = m + 1$.

By using the items from $I^{HI}$ we may do both kinds
of action simultaneously, provided $p = m + 1$ and $a_p$ is
the leftmost member of some rhs of some rule, where
it occurs as head. 1 The lhs of such a rule should
satisfy a requirement which is more specific than the usual
requirement with regard to the head-corner relation. 2
We define the left head-corner relation (and the right
head-corner relation, by symmetry) as a subrelation of
the head-corner relation as follows.

We define: $B \angle A$ if and only if
$A \to \underline{B} \alpha$ for some $\alpha$. The relation
$\angle^*$ now is called the left head-corner relation.

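1 If $a_p$ is not the leftmost member, then no successful
parse will be found, due to the absence of rules with empty
right-hand sides (epsilon rules).

2 Again, the absence of epsilon rules is of importance here.

We define

$$gotoright_1(Q, X) = \{C \to \eta \bullet X \bullet \theta \mid C \to \eta \underline{X} \theta \in P^\dagger \land \exists A \to \alpha \bullet \gamma \bullet B\beta \in Q\, (C \Diamond^* B)\}$$

$$gotoright_2(Q, X) = \{C \to \bullet X \bullet \theta \mid C \to \underline{X} \theta \in P^\dagger \land \exists A \to \alpha \bullet \gamma \bullet B\beta \in Q\, (C \angle^* B)\} \cup \{A \to \alpha \bullet \gamma X \bullet \beta \mid A \to \alpha \bullet \gamma \bullet X\beta \in Q\}$$

and assume symmetric definitions for $gotoleft_1$ and
$gotoleft_2$.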
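The two rightward goto functions can be sketched in code as follows. This is an illustration under assumptions rather than the authors' implementation: rules are hypothetical triples `(lhs, rhs, head_index)`, a double-dotted rule is a hypothetical tuple `(lhs, rhs, lo, hi)` whose dots stand before position `lo` and after position `hi - 1`, nonterminals are the symbols occurring as a left-hand side, and the helper names `star` and `make_gotoright` are introduced here.

```python
def star(pairs, symbols):
    """Reflexive and transitive closure of a binary relation over `symbols`."""
    rel = set(pairs) | {(x, x) for x in symbols}
    changed = True
    while changed:
        changed = False
        for a, b in list(rel):
            for c, d in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d))
                    changed = True
    return rel


def make_gotoright(rules):
    """Build the two rightward goto functions for a fixed set of rules."""
    nts = {lhs for lhs, _, _ in rules}
    # Head-corner relation: (head of rhs, lhs), closed under star.
    hc = star({(rhs[h], lhs) for lhs, rhs, h in rules if rhs[h] in nts}, nts)
    # Left head-corner relation: only rules whose head is leftmost.
    lhc = star({(rhs[0], lhs) for lhs, rhs, h in rules
                if h == 0 and rhs[0] in nts}, nts)

    def expected(Q):
        # Nonterminals B standing immediately right of the second dot.
        return {rhs[hi] for _, rhs, _lo, hi in Q if hi < len(rhs)} & nts

    def gotoright1(Q, X):
        # Start a new rule with head X, predicted via the head-corner relation.
        bs = expected(Q)
        return {(c, rhs, h, h + 1) for c, rhs, h in rules
                if rhs[h] == X and any((c, b) in hc for b in bs)}

    def gotoright2(Q, X):
        bs = expected(Q)
        # Start a new rule whose leftmost member X is also its head ...
        new = {(c, rhs, 0, 1) for c, rhs, h in rules
               if h == 0 and rhs[0] == X and any((c, b) in lhc for b in bs)}
        # ... or move the second dot over X in a rule already in Q.
        moved = {(a, rhs, lo, hi + 1) for a, rhs, lo, hi in Q
                 if hi < len(rhs) and rhs[hi] == X}
        return new | moved

    return gotoright1, gotoright2


# Hypothetical grammar: S' -> _|_ S, S -> A b, A -> a, S -> c A (head A).
demo_rules = [
    ("S'", ("⊥", "S"), 0),
    ("S", ("A", "b"), 0),
    ("A", ("a",), 0),
    ("S", ("c", "A"), 1),
]
goto1, goto2 = make_gotoright(demo_rules)
q0 = {("S'", ("⊥", "S"), 0, 1)}
```

Precomputing both closures once per grammar, as in this sketch, keeps each goto call a simple filter over the rules and over the set `Q`.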

The above discussion gives rise to the new Clauses 1a
and 2a of the algorithm below. The other clauses are
derived analogously from the corresponding clauses of
Algorithm 4. Note that in Clauses 2a and 4a the new
item does not replace the existing item, but is pushed
on top of it; this requires extra items to be popped off
the stack in Clauses 3a and 4a. 3

Algorithm 5 (HI)

$\mathcal{A}^{HI} = (T, I^{HI}, Init(n), \mapsto, Fin(n))$, where
$Init(n) = [-1, -1, \{S' \to \bullet \bot \bullet S\}, 0, n]$,
$Fin(n) = [-1, -1, \{S' \to \bullet \bot S \bullet\}, n, n]$, and $\mapsto$ is defined by:

1a  $[i, k, Q, m, j] \mapsto [i, k, Q, m, j][m, p-1, Q', p, j]$
    where there is $p$ such that $m + 1 < p \le j$ and $a_p = a$
    and $Q' = gotoright_1(Q, a)$ is not empty

2a  $[i, k, Q, m, j] \mapsto [i, k, Q, m, j][i, k, Q', m+1, j]$
    provided $m < j$ and $a_{m+1} = a$ and
    $Q' = gotoright_2(Q, a)$ is not empty

3a  $[i, k, Q, m, j] I_1 \ldots I_{r-1} [i', k', Q', m', j'] \mapsto [i, k, Q, m, j][i', k', Q'', m', j']$
    provided $m < k'$, where there is $B \to \bullet X_1 \ldots X_r \bullet \in Q'$
    such that $Q'' = gotoright_1(Q, B)$ is not empty

4a  $[i, k, Q, m, j] I_1 \ldots I_{r-1} [i', k', Q', m', j'] \mapsto [i, k, Q, m, j][i, k, Q'', m', j]$
    provided $m = k'$ or $k = k'$, where there is
    $B \to \bullet X_1 \ldots X_r \bullet \in Q'$ such that $Q'' = gotoright_2(Q, B)$ is
    not empty

We feel that this algorithm has only limited advantages
over the EHI algorithm for other than degenerate
head grammars, in which the heads occur either mostly
leftmost or mostly rightmost in right-hand sides. In
particular, if there are few sequences of rules of the form
$A \to \underline{A_1} \alpha_1, A_1 \to \underline{A_2} \alpha_2, \ldots, A_{m-1} \to \underline{A_m} \alpha_m$,
or of the form
$A \to \alpha_1 \underline{A_1}, A_1 \to \alpha_2 \underline{A_2}, \ldots, A_{m-1} \to \alpha_m \underline{A_m}$,
then the left and right head-corner relations are very
sparse and HI parsing virtually simplifies to EHI
parsing.

In the following we discuss a variant of head gram- 
mars which may provide more opportunities to use the 
advantages of the LR technique. 

A generalization of head grammars 

The essence of head-driven parsing is that there is a 
distinguished member in each rhs which is recognized 
first. Subsequently, the other members to the right and 
to the left of the head may be recognized. 

3 $I_1 \ldots I_{r-1}$ represent a number of items, as many as
there are members in the rule recognized, minus one.

An artifact of most head-driven parsing algorithms is
that the members to the left of the head are recognized
strictly from right to left, and vice versa for the
members to the right of the head (although recognition of
the members in the left part and in the right part may
be interleaved). This restriction does not seem to be
justified, except by some practical considerations, and
it prevents truly non-directional parsing.

We propose a generalization of head grammars in 
such a way that each of the two parts of a rhs on both 
sides of the head again have a head. The same holds 
recursively for the smaller parts of the rhs. The con- 
sequence is that a rhs can be seen as a binary tree, in 
which each node is labelled by a grammar symbol. The 
root of the tree represents the main head. The left son 
of the root represents the head of the part of the rhs to 
the left of the main head, etc. 

We denote binary trees using a linear notation. For
example, if $\alpha$ and $\beta$ are binary trees, then
$(\alpha)X(\beta)$ denotes the binary tree consisting of a
root labelled $X$, a left subtree $\alpha$ and a right
subtree $\beta$. The notation of empty (sub)trees
($\varepsilon$) may be omitted. The relation $\to^*$
ignores the head information as usual.

Regarding the procedural aspects of grammars,
generalized head grammars have no more power than
traditional head grammars. This fact is demonstrated by
a transformation $T_{head}$ from the former to the latter
class of grammars. A transformed grammar $T_{head}(G)$
contains special nonterminals of the form $[\alpha]$, where
$\alpha$ is a proper subtree of some rhs in the original
grammar $G = (N, T, P, S)$. The rules of the transformed
grammar are given by:

$A \to [\alpha]\, \underline{X}\, [\beta]$   for each $A \to (\alpha)X(\beta) \in P$
$[(\alpha)X(\beta)] \to [\alpha]\, \underline{X}\, [\beta]$   for each proper subtree $(\alpha)X(\beta)$ of a rhs in $G$

where we assume that each member of the form $[\varepsilon]$ in
the transformed grammar is omitted.
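The transformation from generalized head grammars to ordinary head grammars can be sketched as below. The encoding is an assumption made for illustration: a tree is a nested tuple `(left, symbol, right)` with `None` for the empty tree, a rule is a pair `(lhs, tree)`, bracketed nonterminals are represented by strings, and the names `show` and `t_head` are hypothetical.

```python
def show(tree):
    """Linear notation of a tree, omitting empty subtrees."""
    if tree is None:
        return ""
    left, symbol, right = tree
    l = f"({show(left)})" if left is not None else ""
    r = f"({show(right)})" if right is not None else ""
    return f"{l}{symbol}{r}"


def t_head(rules):
    """Turn rules over binary trees into head-grammar rules.

    Returns triples (lhs, rhs, head_index); one bracketed nonterminal is
    introduced per distinct proper subtree, and empty members are dropped.
    """
    out, seen = [], set()

    def split(lhs, tree):
        left, symbol, right = tree
        rhs = []
        if left is not None:
            rhs.append(f"[{show(left)}]")
        head = len(rhs)          # position of the main head in the new rhs
        rhs.append(symbol)
        if right is not None:
            rhs.append(f"[{show(right)}]")
        out.append((lhs, tuple(rhs), head))
        # Each proper subtree gets its own rule, generated only once.
        for sub in (left, right):
            if sub is not None and sub not in seen:
                seen.add(sub)
                split(f"[{show(sub)}]", sub)

    for lhs, tree in rules:
        split(lhs, tree)
    return out


# Hypothetical rule S -> ((c)A(b))s: main head s, whose left part has
# head A with c to its left and b to its right.
leaf = lambda x: (None, x, None)
demo = t_head([("S", ((leaf("c"), "A", leaf("b")), "s", None))])
```

For the hypothetical rule above, the sketch yields one rule for `S` with head `s`, one rule for the bracketed nonterminal `[(c)A(b)]` with head `A`, and unit rules for `[c]` and `[b]`.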

It is interesting to note that $T_{head}$ is a generalization
of a transformation $T_{two}$ which can be used to transform
a context-free grammar into two normal form (each rhs
contains one or two symbols). A transformed grammar
$T_{two}(G)$ contains special nonterminals of the form
$[\alpha]$, where $\alpha$ is a proper suffix of a rhs in
$G$. The rules of $T_{two}(G)$ are given by

$A \to X\, [\alpha]$   for each $A \to X\alpha \in P$
$[X\alpha] \to X\, [\alpha]$   for each proper suffix $X\alpha$ of a rhs in $G$

where we assume that each member of the form $[\varepsilon]$ in
the transformed grammar is omitted.
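The transformation into two normal form can be sketched in the same style. The encoding is again an assumption for illustration: a rule is a pair `(lhs, rhs)` with `rhs` a tuple of symbols, suffix nonterminals are represented by bracketed strings, and `t_two` is a name chosen here.

```python
def t_two(rules):
    """Transform context-free rules so that every rhs has one or two symbols."""
    def name(suffix):
        # Bracketed nonterminal standing for a proper suffix of some rhs.
        return "[" + " ".join(suffix) + "]"

    out, seen = [], set()
    for lhs, rhs in rules:
        current, rest = lhs, tuple(rhs)
        while True:
            first, alpha = rest[0], rest[1:]
            # The member for an empty suffix is omitted from the new rhs.
            out.append((current, (first, name(alpha)) if alpha else (first,)))
            if not alpha or alpha in seen:
                break            # suffix rules already generated
            seen.add(alpha)
            current, rest = name(alpha), alpha
    return out


# Hypothetical rules sharing the suffix "b c".
demo_two = t_two([("S", ("a", "b", "c")), ("T", ("d", "b", "c"))])
```

Rules that share a suffix share the corresponding bracketed nonterminal, so the suffix rules are generated only once.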

HI parsing revisited 

Our next step is to show that generalized head grammars
can be effectively handled with a generalization
of HI parsing (generalized HI (GHI) parsing). This
new algorithm exhibits a superficial similarity to the
2-dimensional LR parsing algorithm from [16]. For a
set $Q$ of trees and rules, 4 $closure(Q)$ is defined to be
the smallest set which satisfies

$$closure(Q) \supseteq Q \cup \{A \to (\alpha)X(\beta) \in P \mid (\gamma)A(\delta) \in closure(Q) \lor B \to (\gamma)A(\delta) \in closure(Q)\}$$

4 It is interesting to compare the relation between trees
and rules with the one between kernel and nonkernel items
of LR parsing [1].

The trees or rules of which the main head is some
specified symbol $X$ can be selected from a set $Q$ by

$$goto(Q, X) = \{t \in Q \mid t = (\alpha)X(\beta) \lor t = A \to (\alpha)X(\beta)\}$$

In a similar way, we can select trees and rules according
to a left or right subtree.

$$gotoleft(Q, \alpha) = \{t \in Q \mid t = (\alpha)X(\beta) \lor t = A \to (\alpha)X(\beta)\}$$

We assume a symmetric definition for $gotoright$.

When we set out to recognize the left subtrees from
a set of trees and rules, we use the following function.

$$left(Q) = closure(\{\alpha \mid (\alpha)X(\beta) \in Q \lor A \to (\alpha)X(\beta) \in Q\})$$

We assume a symmetric definition for $right$.
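The set-valued functions above can be sketched directly over sets of trees and rules. The encoding is an assumption for illustration: a tree is a 3-tuple `(left, symbol, right)` with `None` for the empty tree, a rule is a 2-tuple `(lhs, tree)`, the grammar rules are passed explicitly, and empty left subtrees are dropped in `left` as a simplification, since they have no main head to select on.

```python
def tree_of(t):
    """Tree part of an element of Q, which is either a tree or a rule."""
    return t[1] if len(t) == 2 else t


def closure(Q, rules):
    """Add all rules whose lhs is the main head of a member of the set."""
    result, todo = set(Q), list(Q)
    while todo:
        _, head, _ = tree_of(todo.pop())
        for rule in rules:
            if rule[0] == head and rule not in result:
                result.add(rule)
                todo.append(rule)
    return result


def goto(Q, X):
    """Members of Q whose main head is X."""
    return {t for t in Q if tree_of(t)[1] == X}


def gotoleft(Q, alpha):
    """Members of Q whose left subtree is alpha."""
    return {t for t in Q if tree_of(t)[0] == alpha}


def left(Q, rules):
    """Closure of the non-empty left subtrees of the members of Q."""
    subtrees = {tree_of(t)[0] for t in Q if tree_of(t)[0] is not None}
    return closure(subtrees, rules)


# Hypothetical grammar: S -> ((c)A(b))s and A -> a.
leaf = lambda x: (None, x, None)
inner = (leaf("c"), "A", leaf("b"))
g_rules = [("S", (inner, "s", None)), ("A", leaf("a"))]
Q0 = {g_rules[0]}
```

In this hypothetical grammar, `left(Q0, g_rules)` contains the subtree with main head `A` together with the rule for `A`, which mirrors the way nonkernel items are added in LR parsing.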
The set $I^{GHI}$ contains different kinds of item:

• Items of the form $[i, k, Q, m, j]$, with $i \le k < m \le j$,
indicate that trees $(\alpha)X(\beta)$ and rules $A \to (\alpha)X(\beta)$
in $Q$ are needed deriving a substring of $a_{i+1} \ldots a_j$,
where $X \to^* a_{k+1} \ldots a_m$ has already been
established.

• Items of the form $[k, Q, m, j]$, with $k < m \le j$,
indicate that trees $(\alpha)X(\beta)$ and rules $A \to (\alpha)X(\beta)$ in $Q$
are needed deriving a substring of $a_{k+1} \ldots a_j$, where
$\alpha X \to^* a_{k+1} \ldots a_m$ has already been established.
Items of the form $[i, k, Q, m]$ have a symmetric
meaning.

• Items of the form $[k, t, m]$, with $k < m$, indicate that
$\gamma \to^* a_{k+1} \ldots a_m$ has been established for tree $t = \gamma$
or rule $t = A \to \gamma$.

Algorithm 6 (Generalized HI parsing)

$\mathcal{A}^{GHI} = (T, I^{GHI}, Init(n), \mapsto, Fin(n))$, where
$Init(n) = [-1, \{S' \to \bot(S)\}, 0, n]$,
$Fin(n) = [-1, S' \to \bot(S), n]$, and $\mapsto$ is defined by:

1a  $[i, k, Q, m, j] \mapsto [i, k, Q', m]$
    provided $Q' = gotoright(Q, \varepsilon)$ is not empty

1b  $[i, k, Q, m, j] \mapsto [k, Q', m, j]$
    provided $Q' = gotoleft(Q, \varepsilon)$ is not empty

1c  $[k, Q, m, j] \mapsto [k, t, m]$
    provided $t \in gotoright(Q, \varepsilon)$

1d  $[i, k, Q, m] \mapsto [k, t, m]$
    provided $t \in gotoleft(Q, \varepsilon)$

2a  $[i, k, Q, m, j] \mapsto [i, k, Q, m, j][m, p-1, Q', p, j]$
    where there is $p$ such that $m < p \le j$ and
    $Q' = goto(right(Q), a_p)$ is not empty

2b  $[i, k, Q, m, j] \mapsto [i, k, Q, m, j][i, p-1, Q', p, k]$
    where there is $p$ such that $i < p \le k$ and
    $Q' = goto(left(Q), a_p)$ is not empty

3a  $[k, Q, m, j] \mapsto [k, Q, m, j][m, p-1, Q', p, j]$
    where there is $p$ such that $m < p \le j$ and
    $Q' = goto(right(Q), a_p)$ is not empty

3b  $[i, k, Q, m] \mapsto [i, k, Q, m][i, p-1, Q', p, k]$
    where there is $p$ such that $i < p \le k$ and
    $Q' = goto(left(Q), a_p)$ is not empty

4a  $[i, k, Q, m, j][k', \gamma, m'] \mapsto [i, k, Q', m']$
    provided $m = k'$, where $Q' = gotoright(Q, \gamma)$

4b  Symmetric to 4a (cf. 2a and 2b)

5a  $[k, Q, m, j][k', \gamma, m'] \mapsto [k, t, m']$
    provided $m = k'$, where $t \in gotoright(Q, \gamma)$

5b  Symmetric to 5a (cf. 3a and 3b)

6a  $[i, k, Q, m, j][k', A \to \gamma, m'] \mapsto [i, k, Q, m, j][m, k', Q', m', j]$
    provided $m \le k'$, where $Q' = goto(right(Q), A)$

6b  Symmetric to 6a

7a  $[k, Q, m, j][k', A \to \gamma, m'] \mapsto [k, Q, m, j][m, k', Q', m', j]$
    provided $m \le k'$, where $Q' = goto(right(Q), A)$

7b  Symmetric to 7a

Figure 1: Generalized HI parsing

The algorithm above is based on the transformation
$T_{head}$. It is therefore not surprising that this algorithm
is reminiscent of LR parsing [1] for a transformed
grammar $T_{two}(G)$. For most clauses, a rough correspondence
with actions of LR parsing can be found: Clauses 2
and 3 correspond with shifts. Clause 5 corresponds
with reductions with rules of the form $[X\alpha] \to X\, [\alpha]$
in $T_{two}(G)$. Clauses 6 and 7 correspond with
reductions with rules of the form $A \to X\, [\alpha]$ in $T_{two}(G)$. For
Clauses 1 and 4, corresponding actions are hard to find,
since these clauses seem to be specific to generalized
head-driven parsing.

The reason that we based Algorithm 6 on $T_{head}$ is
twofold. Firstly, the algorithm above is more
appropriate for presentational purposes than an alternative
algorithm we have in mind which is not based on $T_{head}$,
and secondly, the resulting parsers need fewer sets $Q$.
This is similar in the case of LR parsing. 5

Example 1 Consider the generalized head grammar
with the following rules:

$S \to ((c)A(b))\,s \mid (A(d))\,s \mid (B)\,s$
$A \to a$
$B \to A(b)$

Assume the input is given by $a_1 a_2 a_3 a_4 = c\ a\ b\ s$. The
steps performed by the algorithm are given in Figure 1.

□

Apart from HI parsing, also TD, HC, PHI, and EHI 
parsing can be adapted to generalized head-driven pars- 
ing. 

Correctness 

The head-driven stack automata studied so far differ 
from one another in their degree of nondeterminism. 
In this section we take a different perspective. For all 
these devices, we show that quite similar relations ex- 
ist between stack contents and the way input strings 
are visited. Correctness results easily follow from such 
characterisations. (Proofs of statements in this section 
are omitted for reasons of space.) 

Let G = (N , T, P, S) be a head grammar. To be used 
below, we introduce a special kind of derivation. 



5 It is interesting to compare LR parsing for a context-free
grammar $G$ with LR parsing for the transformed grammar
$T_{two}(G)$. The transformation has the effect that a
reduction with a rule is replaced by a cascade of reductions with
smaller rules; apart from this, the transformation does not
affect the global run-time behaviour of LR parsing. More
serious are the consequences for the size of the parser: the
required number of LR states for the transformed grammar
is smaller [9].




Figure 2: A head-outward sentential form derived by
the composition of $\sigma$-derivations $\rho_i$, $1 \le i \le 3$. The
starting place of each $\sigma$-derivation is indicated, each
triangle representing the application of a single
production.



Definition 1 A $\sigma$-derivation has the form

$$A \xrightarrow{p_1 p_2 \cdots p_{s-1}} \gamma_0 B \gamma_1 \xrightarrow{p_s} \gamma_0 \alpha \eta \beta \gamma_1 \xrightarrow{\rho} \gamma_0 \alpha x \beta \gamma_1, \qquad (1)$$

where $p_1, p_2, \ldots, p_s$ are productions in $P^\dagger$, $s \ge 1$, $p_i$
rewrites the unique nonterminal occurrence introduced
as the head element of $p_{i-1}$ for $2 \le i \le s$,
$p_s = (B \to \alpha \underline{\eta} \beta)$ and $\rho \in P^{\dagger *}$ rewrites $\eta$ into $x \in T^+$.

The indicated occurrence of string $\eta$ in (1) is called the
handle of the $\sigma$-derivation. When defined, the
rightmost (leftmost) nonterminal occurrence in $\alpha$ ($\beta$,
respectively) is said to be adjacent to the handle. The
notions of handle and adjacent nonterminal occurrence
extend in an obvious way to derivations of the form
$\phi A \theta \xrightarrow{\rho} \phi \gamma_0 x \gamma_1 \theta$, where $A \xrightarrow{\rho} \gamma_0 x \gamma_1$ is a $\sigma$-derivation.

By composing $\sigma$-derivations, we can now define the
class of sentential forms we are interested in. (Figure 2
shows an example.)

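Definition 2 A head-outward sentential form is
obtained through a derivation

$$S \xrightarrow{\rho_1} \gamma_{1,0} x_{1,1} \gamma_{1,1} \xrightarrow{\rho_2} \gamma_{2,0} x_{2,1} \gamma_{2,1} x_{2,2} \gamma_{2,2} \cdots \xrightarrow{\rho_q} \gamma_{q,0} x_{q,1} \gamma_{q,1} x_{q,2} \gamma_{q,2} \cdots \gamma_{q,q-1} x_{q,q} \gamma_{q,q} \qquad (2)$$

where $q \ge 1$, each $\rho_i$ is a $\sigma$-derivation and, for $2 \le i \le q$,
only one string $\gamma_{i-1,j}$ is rewritten by applying $\rho_i$ at a
nonterminal occurrence adjacent to the handle of $\rho_{i-1}$.

Sequence $\rho_1, \rho_2, \ldots, \rho_q$ is said to derive the sentential
form in (2).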

The definition of head-outward sentential form sug- 
gests a corresponding notion of head-outward deriva- 
tion. Informally, a head-outward derivation proceeds by 
recursively expanding to a terminal string first the head 



of a rule, and then the remaining members of the rhs, 
in an outward order. Conversely, we have head-inward 
(HI) derivations, where first the remaining members 
in the rhs are expanded, in an inward order (toward 
the head), after which the head itself is recursively ex- 
panded. Note that HI parsing recognizes a string by 
computing an HI derivation in reverse (cf. LR parsing). 

Let $w = a_1 a_2 \cdots a_n$, $n \ge 1$, be a string over $T$ and let
$a_0 = \bot$. For $-1 \le i < j \le n$, we write $(i, j]_w$ to denote
substring $a_{i+1} \cdots a_j$.

Theorem 1 For $\mathcal{A}$ one of $\mathcal{A}^{HC}$, $\mathcal{A}^{PHI}$ or $\mathcal{A}^{EHI}$, the
following facts are equivalent:

(i) $\mathcal{A}$ reaches a configuration whose stack contents are
$I_1 I_2 \cdots I_q$, $q \ge 1$, with

$I_t = [i_t, k_t, A_t \to \alpha_t \bullet \eta_t \bullet \beta_t, m_t, j_t]$ or
$I_t = [i_t, k_t, A_t \to \eta_t, m_t, j_t]$ or
$I_t = [i_t, k_t, \Delta_t \to \eta_t, m_t, j_t]$

for the respective automata, $1 \le t \le q$;

(ii) a sequence of $\sigma$-derivations $\rho_1, \rho_2, \ldots, \rho_q$, $q \ge 1$,
derives a head-outward sentential form

$$\gamma_0 (k_{\pi(1)}, m_{\pi(1)}]_w \gamma_1 (k_{\pi(2)}, m_{\pi(2)}]_w \gamma_2 \cdots \gamma_{q-1} (k_{\pi(q)}, m_{\pi(q)}]_w \gamma_q$$

where $\pi$ is a permutation of $\{1, \ldots, q\}$, $\rho_t$ has
handle $\eta_t$ which derives $(k_{\pi(t)}, m_{\pi(t)}]_w$, $1 \le t \le q$, and
$m_{\pi(t-1)} \le k_{\pi(t)}$, $2 \le t \le q$.

As an example, an accepting stack configuration
$[-1, -1, S' \to \bullet \bot S \bullet, n, n]$ corresponds to a
$\sigma$-derivation $(S' \to \bot S)\rho$, $\rho \in P^+$, with handle
$\bot S$ which derives the head-outward sentential form
$\gamma_0 (-1, n]_w \gamma_1 = \bot w$, from which the correctness of the
head-corner algorithm follows directly.

If we assume that $G$ does not contain any useless
symbols, then Theorem 1 has the following consequence. If
the automaton at some point has consulted the symbols
$a_{i_1}, a_{i_2}, \ldots, a_{i_m}$ from the input string, $i_1, \ldots, i_m$
increasing indexes, then there is a string in the language
generated by $G$ of the form $v_0 a_{i_1} v_1 \cdots v_{m-1} a_{i_m} v_m$.
Such a statement may be called correct subsequence
property (a generalization of correct prefix property [8]).
Note that the order in which the input symbols are
consulted is only implicit in Theorem 1 (the permutation
$\pi$) but is severely restricted by the definition of
head-outward sentential form. A more careful
characterisation can be obtained, but will take us outside of the
scope of this paper.

The correct subsequence property is enforced by the
(top-down) predictive feature of the automata, and
holds also for $\mathcal{A}^{TD}$ and $\mathcal{A}^{HI}$. Characterisations
similar to Theorem 1 can be provided for these devices. We
investigate below the GHI automaton.

For an item $I \in I^{GHI}$ of the form $[i, k, Q, m, j]$,
$[k, Q, m, j]$, $[i, k, Q, m]$ or $[k, t, m]$, we say that $k$ ($m$
respectively) is its left (right) component. Let $N'$ be
the set of nonterminals of the head grammar $T_{head}(G)$.
We need a function $yld$ from reachable items in $I^{GHI}$
into $(N' \cup T)^*$, specified as follows. If we assume
that $(\alpha)X(\beta) \in Q \lor A \to (\alpha)X(\beta) \in Q$ and
$t = (\alpha)X(\beta) \lor t = A \to (\alpha)X(\beta)$, then

$yld(I) = X$                     if $I = [i, k, Q, m, j]$
$yld(I) = [\alpha] X$            if $I = [k, Q, m, j]$
$yld(I) = X [\beta]$             if $I = [i, k, Q, m]$
$yld(I) = [\alpha] X [\beta]$    if $I = [k, t, m]$

It is not difficult to show that the definition of $yld$ is
consistent (i.e. the particular choice of a tree or rule
from $Q$ is irrelevant).

Theorem 2 The following facts are equivalent:

(i) $\mathcal{A}^{GHI}$ reaches a configuration whose stack contents
are $I_1 I_2 \cdots I_q$, $q \ge 1$, with $k_t$ and $m_t$ the left and right
components, respectively, of $I_t$, and $yld(I_t) = \eta_t$, for
$1 \le t \le q$;

(ii) a sequence of $\sigma$-derivations $\rho_1, \rho_2, \ldots, \rho_q$, $q \ge 1$,
derives in $T_{head}(G)$ a head-outward sentential form

$$\gamma_0 (k_{\pi(1)}, m_{\pi(1)}]_w \gamma_1 (k_{\pi(2)}, m_{\pi(2)}]_w \gamma_2 \cdots \gamma_{q-1} (k_{\pi(q)}, m_{\pi(q)}]_w \gamma_q$$

where $\pi$ is a permutation of $\{1, \ldots, q\}$, $\rho_t$ has
handle $\eta_t$ which derives $(k_{\pi(t)}, m_{\pi(t)}]_w$, $1 \le t \le q$, and
$m_{\pi(t-1)} \le k_{\pi(t)}$, $2 \le t \le q$.

Discussion 

We have presented a family of head-driven algorithms: 
TD, HC, PHI, EHI, and HI parsing. The existence of 
this family demonstrates that head-driven parsing cov- 
ers a range of parsing algorithms wider than commonly 
thought. 

The algorithms in this family are increasingly deter- 
ministic, which means that the search trees have a de- 
creasing size, and therefore simple realizations, such as 
backtracking, are increasingly efficient. 

However, similar to the left-to-right case, this does
not necessarily hold for tabular realizations of these
algorithms. The reason is that the more refined an
algorithm is, the more items represent computation of
a single subderivation, and therefore some
subderivations may be computed more than once. This is called
redundancy. Redundancy has been investigated for the
left-to-right case in [8], which solves this problem for
ELR parsing. Head-driven algorithms have an
additional source of redundancy, which has been solved for
tabular HC parsing in [14]. The idea from [14] can also
be applied to the other head-driven algorithms from
this paper.

We have further proposed a generalization of head- 
driven parsing, and we have shown an example of 
such an algorithm based on LR parsing. Prospects to 
even further generalize the ideas from this paper seem 
promising. 